364
Chapter 8
ptimised Peptide Pattern Discovery
hen analysing the patterns of protease cleavage sites or
sttranslational modification sites based on peptide data, a
y question is whether it is possible to discover interpretable
d explainable as well visible rules by which how peptides
classified can be well-understood. A linear model benefits
etter interpretation between experimental data sets such as
ptides and peptide labels. However, the relationship
tween peptides used in either protease cleavage pattern
covery or posttranslational modification pattern discovery
d peptide labels may not always be simple. Moreover,
ptides are non-numerical data. On the other hand, most
nlinear models such as neural network models do not offer
fficient insight into data. The decision-tree algorithms or
random forest algorithms are capable of providing a better
erpretation to a model. However, in order to discover the
timal models, an expensive exhaustive enumeration has to
considered. This is why the evolutionary computation
proaches have provided a better way and have been well-
mployed in many areas for generating optimal or near
timal models with a better interpretation capability. This
apter will introduce a different type of machine learning
proaches for this kind of biological pattern discovery. It is
genetic programming algorithm, which is one type of the
olutionary computation approaches. This chapter will
roduce how the genetic programming algorithm can be
ed for discovering the interpretable rules for a peptide data